
7,422 research outputs found

Moses-based official baseline for NEWS 2016

Author: Ruiz Costa-Jussà Marta
Publication date: 01/01/2016

Transliteration is the phonetic translation between two different languages. Many works approach transliteration using machine translation methods. This paper describes the official baseline system for the NEWS 2016 workshop shared task. The baseline is a standard phrase-based machine translation system built with Moses. Results fall between the best and worst results from last year’s workshop, providing a reasonable starting point for this year’s participants. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Ongoing study for enhancing Chinese-Spanish translation with morphology strategies

Author: Ruiz Costa-Jussà Marta
Publication date: 01/01/2015

Chinese and Spanish have different morphological structures, which poses a major challenge for translating between this pair of languages. In this paper, we analyze several strategies to better generalize from Chinese, a language without rich morphology, to Spanish, a morphologically rich language. The strategies use a first step of Spanish morphology-based simplification and a second step of full-form generation. The latter can be done using a translation system or classification methods. Finally, both steps are combined either by concatenation in cascade or by integration in a factored-based style. Ongoing experiments (based on the United Nations corpus) and their results are described. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

How much hybridisation does machine translation need?

Author: Ruiz Costa-Jussà Marta
Publication venue: Wiley
Publication date: 01/01/2015

This is the peer reviewed version of the following article: [Costa-jussà, M. R. (2015), How much hybridization does machine translation need? J Assn Inf Sci Tec, 66: 2160–2165. doi:10.1002/asi.23517], which has been published in final form at [10.1002/asi.23517]. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. Rule-based and corpus-based machine translation (MT) have coexisted for more than 20 years. Recently, boundaries between the two paradigms have narrowed and hybrid approaches are gaining interest from both academia and businesses. However, since hybrid approaches involve the multidisciplinary interaction of linguists, computer scientists, engineers, and information specialists, understandably a number of issues exist. While statistical methods currently dominate research work in MT, most commercial MT systems are technically hybrid systems. The research community should investigate the benefits and questions surrounding the hybridization of MT systems more actively. This paper discusses various issues related to hybrid MT including its origins, architectures, achievements, and frustrations experienced in the community. It can be said that both rule-based and corpus-based MT systems have benefited from hybridization when effectively integrated. In fact, many of the current rule/corpus-based MT approaches are already hybridized since they do include statistics/rules at some point. Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Why Catalan-Spanish Neural Machine Translation? Analysis, comparison and combination with standard Rule and Phrase-based technologies

Author: Ruiz Costa-Jussà Marta
Publication date: 01/01/2017

Catalan and Spanish are two related languages, given that both derive from Latin. They share similarities at several linguistic levels, including morphology, syntax and semantics. This makes them particularly interesting for the MT task. Given the recent appearance and popularity of neural MT, this paper analyzes the performance of this new approach compared to the well-established rule-based and phrase-based MT systems. Experiments are reported on a large database of 180 million words. Results, in terms of standard automatic measures, show that neural MT clearly outperforms the rule-based and phrase-based MT systems on the in-domain test set, but it is worse on the out-of-domain test set. A naive system combination works especially well for the latter. In-domain manual analysis shows that neural MT tends to improve both adequacy and fluency, for example, by generating more natural translations instead of literal ones, choosing the appropriate target word when the source word has several translations, and improving gender agreement. However, out-of-domain manual analysis shows that neural MT is more affected by unknown words or contexts. Postprint (published version)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Domain adaptation strategies in statistical machine translation: a brief overview

Author: Ruiz Costa-Jussà Marta
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2015

© Cambridge University Press, 2015. Statistical machine translation (SMT) is gaining interest given that it can easily be adapted to any pair of languages. One of the main challenges in SMT is domain adaptation, because translation performance drops when testing conditions deviate from training conditions. A growing body of research addresses this challenge, focusing on exploiting all kinds of material, where available. This paper provides an overview of research that copes with the domain adaptation challenge in SMT. Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Description of the Chinese-to-Spanish rule-based machine translation system developed with a hybrid combination of human annotation and statistical techniques

Author: Centelles Jordi
Ruiz Costa-Jussà Marta
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2015

Two of the most popular Machine Translation (MT) paradigms are rule-based (RBMT) and corpus-based, the latter including statistical systems (SMT). When parallel corpora are scarce, RBMT becomes particularly attractive. This is the case for the Chinese-Spanish language pair. This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system that takes advantage of available resources, such as parallel corpora, which are used to extract dictionaries and lexical and structural transfer rules. The final system is freely available online and open source. Although performance lags behind standard SMT systems on an in-domain test set, the results show that the RBMT system’s coverage is competitive and that it outperforms the SMT system on an out-of-domain test set. This RBMT system is available to the general public, can be further enhanced, and opens up the possibility of creating future hybrid MT systems. Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

WMT 2016 Multimodal translation system description based on bidirectional recurrent neural networks with double-embeddings

Author: Rodríguez Guasch Sergio
Ruiz Costa-Jussà Marta
Publication date: 01/01/2016

Bidirectional Recurrent Neural Networks (BiRNNs) have shown outstanding results on sequence-to-sequence learning tasks. This architecture is especially interesting for the multimodal machine translation task, since BiRNNs can deal with images and text. In most translation systems the same word embedding is fed to both BiRNN units. In this paper, we present several experiments to enhance a baseline sequence-to-sequence system (Elliott et al., 2015), for example, by using double embeddings. These embeddings are trained on the forward and backward directions of the input sequence. Our system is trained, validated and tested on the Multi30K dataset (Elliott et al., 2016) in the context of the WMT 2016 Multimodal Translation Task. The obtained results show that the double-embedding approach performs significantly better than the traditional single-embedding one. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Evaluating the Underlying Gender Bias in Contextualized Word Embeddings

Author: Basta Christine
Casas Noe
Costa-jussà Marta R.
Publication date: 01/01/2019

Gender bias strongly affects natural language processing applications. Word embeddings have clearly been proven both to keep and amplify gender biases that are present in current data sources. Recently, contextualized word embeddings have enhanced previous word embedding techniques by computing word vector representations dependent on the sentence they appear in. In this paper, we study the impact of this conceptual change in the word embedding computation in relation to gender bias. Our analysis includes different measures previously applied in the literature to standard word embeddings. Our findings suggest that contextualized word embeddings are less biased than standard ones, even when the latter are debiased.

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Enhancing Word Embeddings with Knowledge Extracted from Lexical Resources

Author: Biesialska Magdalena
Costa-jussà Marta R.
Rafieian Bardia
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2020

In this work, we present an effective method for semantic specialization of word vector representations. To this end, we use traditional word embeddings and apply specialization methods to better capture semantic relations between words. In our approach, we leverage external knowledge from rich lexical resources such as BabelNet. We also show that our proposed post-specialization method, based on an adversarial neural network with the Wasserstein distance, yields improvements over state-of-the-art methods on two tasks: word similarity and dialog state tracking. Comment: Accepted to ACL 2020 SR

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Stages for the More Sustainable Farm

Author: Marta-Costa Ana Alexandra
Poeta A.

Currently, agricultural farm units face a double and often contradictory challenge in order to be successful: on the one hand, the invested capital has to be profitable and economic performance has to be maximised. On the other hand, given the socio-environmental situation, it is necessary to preserve and protect the environment and natural resources. Given the potential conflict between the two aims, since satisfying one implies underperforming on the other (and vice versa), the question is: which solution should be chosen? In this work, we intend to formulate a farm plan that reconciles the criteria of environmental sustainability with those of economic competitiveness. To achieve this, we carry out a comparative study of the sustainability of different groups of farms identified in the study area (first evaluation cycle) using the MESMIS methodology (“Marco para la Evaluación de Sistemas de Manejo de Recursos Naturales Mediante Indicadores de Sustentabilidad”, Framework for Evaluation of Natural-Resource Systems Handling through Sustainability Indicators), which allowed us to select the most sustainable group of farms. Based on the strengths and weaknesses found in these production systems, we proceeded to plan a bovine meat production unit that simultaneously meets economic and environmental objectives, using Multicriteria Decision. We concluded the work with a sustainability evaluation comparing the previously identified groups of farms and the planned farms (second evaluation cycle), based, again, on the MESMIS methodology, to confirm (or not) the greater sustainability of the latter. Analysis of the results allows us to confirm the greater relative sustainability of the planned farm across the various scenarios considered. Keywords: Decision taking, planning, sustainability, Environmental Economics and Policy, Farm Management

Research Papers in Economics